Log-Linear Models

Author

  • Noah A. Smith
Abstract

This is yet another introduction to log-linear (“maximum entropy”) models for NLP practitioners, in the spirit of Berger (1996) and Ratnaparkhi (1997b). The derivations here are similar to Berger’s, but more details are filled in and some errors are corrected. I do not address iterative scaling (Darroch and Ratcliff, 1972), but rather give derivations of the gradient and Hessian of the dual objective function (conditional likelihood).

Note: This is a draft; please contact the author if you have comments, and do not cite or circulate this document.

1 Log-linear Models

Log-linear models [1] have become a widely used tool in NLP classification tasks (Berger et al., 1996; Ratnaparkhi, 1998). Log-linear models assign joint probabilities to observation/label pairs $(x, y) \in X \times Y$ as follows:

$$\Pr_{\vec{\theta}}(x, y) = \frac{\exp\left(\vec{\theta} \cdot \vec{f}(x, y)\right)}{\sum_{x', y'} \exp\left(\vec{\theta} \cdot \vec{f}(x', y')\right)} \tag{1}$$

where $\vec{\theta}$ is an $\mathbb{R}$-valued vector of feature weights and $\vec{f}$ is a function that maps pairs $(x, y)$ to a nonnegative $\mathbb{R}$-valued feature vector. These features can take on any form; in particular, unlike directed, generative models (like HMMs and PCFGs), the features may overlap, predicting parts of the data more than once. [2] Each feature has an associated $\theta_i$, which is called its weight.

Maximum likelihood parameter estimation (training) for such a model, with a set of labeled examples, amounts to solving the following optimization problem. Let $\{(x_1, y_1^*), (x_2, y_2^*), \ldots, (x_m, y_m^*)\}$

* This document is a revised version of portions of the author’s 2004 thesis research proposal, “Discovering grammatical structure in unannotated text: implicit negative evidence and dynamic feature selection.”

[1] Such models have many names, including maximum-entropy models, exponential models, and Gibbs models. Markov random fields are structured log-linear models, and conditional random fields (Lafferty et al., 2001) are Markov random fields with a specific training criterion.

[2] The ability to handle arbitrary, overlapping features is an important advantage that log-linear models have over directed generative models (like HMMs and PCFGs). Importantly, for the latter type of model, maximum likelihood estimation is not straightforward when complicated feature dependencies are introduced that cannot be described as a series of generative steps (Abney, 1997). This comparison will be more fully explored in my thesis.
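As a rough illustration of equation (1), the sketch below computes joint probabilities for a toy model in which X and Y are small finite sets, so the normalizing sum can be enumerated directly. The word and tag sets, the three indicator features, and the weights are all hypothetical and chosen only for illustration; the helper name `joint_log_linear` is likewise an assumption, not something from the paper. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here and does not change the resulting probabilities.

```python
import math
from itertools import product


def joint_log_linear(theta, feature_fn, xs, ys):
    """Return {(x, y): Pr_theta(x, y)} following equation (1).

    Assumes X and Y are finite so the denominator can be enumerated.
    """
    # Linear score theta . f(x, y) for every pair in X x Y.
    scores = {
        (x, y): sum(t * f for t, f in zip(theta, feature_fn(x, y)))
        for x, y in product(xs, ys)
    }
    # Shift by the max score for numerical stability (cancels in the ratio).
    top = max(scores.values())
    unnormalized = {pair: math.exp(s - top) for pair, s in scores.items()}
    z = sum(unnormalized.values())  # denominator of equation (1), shifted
    return {pair: u / z for pair, u in unnormalized.items()}


# Hypothetical toy observation and label sets.
XS = ["run", "dog"]
YS = ["NOUN", "VERB"]


def toy_features(x, y):
    """Nonnegative, overlapping indicator features (illustrative only)."""
    return [
        1.0 if y == "NOUN" else 0.0,                     # label prior
        1.0 if (x == "dog" and y == "NOUN") else 0.0,    # overlaps with the first
        1.0 if (x == "run" and y == "VERB") else 0.0,
    ]


theta = [0.5, 1.0, 2.0]  # hypothetical feature weights
probs = joint_log_linear(theta, toy_features, XS, YS)
for pair in sorted(probs):
    print(pair, round(probs[pair], 4))
```

Note that the first two features both fire on ("dog", "NOUN"), which is the kind of overlap the text describes. For realistic NLP models the set X is far too large to enumerate, which is one reason the conditional likelihood mentioned in the abstract is the usual training objective.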


Similar resources

Monitoring Multinomial Logit Profiles via Log-Linear Models (Quality Engineering Conference Paper)

In certain statistical process control applications, the quality of a process or product can be characterized by a function commonly referred to as a profile. Some of the potential applications of profile monitoring are cases where the quality characteristic of interest is modelled using binary, multinomial or ordinal variables. In this paper, profiles with multinomial response are studied. For this purpo...


Determinants of Inflation in Selected Countries

This paper focuses on developing models to study influential factors on the inflation rate for a panel of available countries in the World Bank database during 2008-2012. For this purpose, random-effect log-linear and ordinal logistic models are used for the analysis of continuous and categorical inflation rate variables. As the original inflation rate response to variables shows an appar...


A Convergence Analysis of Log-Linear Training

Log-linear models are widely used probability models for statistical pattern recognition. Typically, log-linear models are trained according to a convex criterion. In recent years, the interest in log-linear models has greatly increased. The optimization of log-linear model parameters is costly and therefore an important topic, in particular for large-scale applications. Different optimization ...


Log-linear modelling

Log-linear analysis has become a widely used method for the analysis of multivariate frequency tables obtained by cross-classifying sets of nominal, ordinal, or discrete interval level variables. Examples of textbooks discussing categorical data analysis by means of log-linear models are [4], [2], [14], [15], [16], and [27]. We start by introducing the standard hierarchical log-linear modelling ...


Estimating infectious diseases incidence: validity of capture-recapture analysis and truncated models for incomplete count data.

Capture-recapture analysis has been used to evaluate infectious disease surveillance. Violation of the underlying assumptions can jeopardize the validity of the capture-recapture estimates and a tool is needed for cross-validation. We re-examined 19 datasets of log-linear model capture-recapture studies on infectious disease incidence using three truncated models for incomplete count data as al...


Discriminative adaptation for log-linear acoustic models

Log-linear models have recently been used in acoustic modeling for speech recognition systems. This has been motivated by competitive results compared to systems based on Gaussian models, and a more direct parametrisation of the posterior model. To competitively use log-linear models for speech recognition, important methods, such as speaker adaptation, have to be reformulated in a log-linear f...




Publication date: 2004